Accelerated gradient descent

Initialize starting vector $\mathbf{v}^{(0)} = \mathbf{y}^{(1)} = \mathbf{z}^{(1)}$. For $t = 1, \dots, T$, compute the accelerated update steps.
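
The updates themselves are not written out above. As a sketch of what they might look like: one standard form of Nesterov's method for a $\beta$-smooth, $\alpha$-strongly convex $f$ takes a gradient step followed by a constant-momentum extrapolation (this two-sequence variant and its coefficient are assumptions here, not taken from the source; the three initialized sequences $\mathbf{v}, \mathbf{y}, \mathbf{z}$ suggest the source may instead use a linear-coupling presentation):

$$\mathbf{y}^{(t+1)} = \mathbf{z}^{(t)} - \frac{1}{\beta}\nabla f(\mathbf{z}^{(t)}), \qquad \mathbf{z}^{(t+1)} = \mathbf{y}^{(t+1)} + \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\left(\mathbf{y}^{(t+1)} - \mathbf{y}^{(t)}\right).$$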

Let $f$ be $\alpha$-strongly convex and $\beta$-smooth. Then, after running accelerated gradient descent for $T$ steps, the final iterate $\mathbf{x}^{(T)}$ satisfies, with $\kappa = \frac{\beta}{\alpha}$,

$$f(\mathbf{x}^{(T)}) - f(\mathbf{x}^*) \leq \kappa\, e^{-T/\sqrt{\kappa}}\left[f(\mathbf{x}^{(0)}) - f(\mathbf{x}^*)\right]$$
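
To make this concrete, here is a minimal runnable sketch (NumPy) under the same assumed update rule as above; the names (`agd`, `grad_f`, the toy quadratic) are illustrative, not from the source:

```python
import numpy as np

def agd(grad_f, x0, alpha, beta, T):
    """T steps of (a standard variant of) accelerated gradient descent.

    Assumes f is beta-smooth and alpha-strongly convex; uses the
    constant momentum coefficient (sqrt(kappa)-1)/(sqrt(kappa)+1),
    which is an assumption about the variant, not the source's pseudocode.
    """
    kappa = beta / alpha
    momentum = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    y_prev = x0.copy()  # y^{(1)}
    z = x0.copy()       # z^{(1)}
    for _ in range(T):
        y = z - grad_f(z) / beta         # gradient step at extrapolated point
        z = y + momentum * (y - y_prev)  # momentum extrapolation
        y_prev = y
    return y

# Toy check: f(x) = 0.5 * x^T A x, with eigenvalues of A in [alpha, beta],
# so f is alpha-strongly convex, beta-smooth, and minimized at x* = 0.
rng = np.random.default_rng(0)
alpha, beta, d = 1.0, 100.0, 50          # kappa = 100
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = Q @ np.diag(np.linspace(alpha, beta, d)) @ Q.T

f = lambda x: 0.5 * x @ A @ x
x0 = rng.standard_normal(d)
x_hat = agd(lambda x: A @ x, x0, alpha, beta, T=100)
print(f(x_hat) / f(x0))  # roughly bounded by kappa * exp(-T / sqrt(kappa))
```

With $\kappa = 100$ and $T = 100$, the bound predicts a relative error of at most $\kappa\, e^{-T/\sqrt{\kappa}} = 100\, e^{-10} \approx 4.5 \times 10^{-3}$.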


Complexity analysis:

Suppose $f$ is strongly convex, with $mI \preceq \nabla^2 f(x) \preceq MI$ for all $x$. Recall that plain gradient descent with step size $t = 1/M$ satisfies
$$f(x^{(k)}) - p^* \leq \left(1 - \frac{1}{\kappa}\right)^k \left(f(x^{(0)}) - p^*\right),$$
where $\kappa = M/m$ and $p^*$ is the minimum value of $f$.
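
To compare iteration counts (a short derivation, using $1 - x \leq e^{-x}$; the target is relative accuracy $\epsilon$, i.e. $f - p^* \leq \epsilon\,(f(x^{(0)}) - p^*)$):

$$\left(1 - \frac{1}{\kappa}\right)^k \leq e^{-k/\kappa} \leq \epsilon \quad \text{once} \quad k \geq \kappa \log\frac{1}{\epsilon},$$

whereas the accelerated bound above gives

$$\kappa\, e^{-T/\sqrt{\kappa}} \leq \epsilon \quad \text{once} \quad T \geq \sqrt{\kappa}\, \log\frac{\kappa}{\epsilon}.$$

So acceleration improves the dependence on the condition number from $\kappa$ to roughly $\sqrt{\kappa}$.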

#incomplete

